Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning
Multi-simulator training has contributed to the recent success of Deep Reinforcement Learning (Deep RL) by stabilizing learning and allowing for higher training throughputs. In this work, we propose Gossip-based Actor-Learner Architectures (GALA) where several actor-learners (such as A2C agents) are organized in a peer-to-peer communication topology, and exchange information through asynchronous gossip in order to take advantage of a large number of distributed simulators. We prove that GALA agents remain within an epsilon-ball of one another during training when using loosely coupled asynchronous communication. By reducing the amount of synchronization between agents, GALA is more computationally efficient and scalable compared to A2C, its fully-synchronous counterpart. GALA also outperforms A2C, being more robust and sample efficient. We show that we can run several loosely coupled GALA agents in parallel on a single GPU and achieve significantly higher hardware utilization and frame-rates than vanilla A2C at comparable power draws.
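The gossip mechanism the abstract describes can be illustrated with a small toy sketch: several learners, each taking local update steps, periodically average parameters with a peer on a directed ring so that all agents drift toward consensus without full synchronization. This is a minimal illustration, not the paper's implementation; the function names (`gossip_step`, `train`), the ring topology, the mixing weight, and the synchronous rounds (the paper's gossip is asynchronous) are all simplifying assumptions, and the "local update" is a noisy pull toward zero standing in for an A2C policy-gradient step.

```python
import random

def gossip_step(params, ring_next, mix=0.5):
    """One round of gossip: each agent averages its parameter vector
    with that of its successor on a directed ring."""
    n = len(params)
    return [
        [mix * a + (1 - mix) * b for a, b in zip(params[i], params[ring_next[i]])]
        for i in range(n)
    ]

def train(n_agents=4, dim=3, steps=200, lr=0.1, seed=0):
    """Alternate local update steps with gossip mixing and return the
    final per-agent parameter vectors."""
    rng = random.Random(seed)
    # Each agent starts from a different random point (its own learner state).
    params = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_agents)]
    ring_next = [(i + 1) % n_agents for i in range(n_agents)]
    for _ in range(steps):
        # Local update: a noisy gradient step toward zero, standing in for
        # each agent's independent policy-gradient update.
        params = [[p - lr * (p + rng.gauss(0.0, 0.01)) for p in agent]
                  for agent in params]
        # Communication: mix with the ring neighbor's parameters.
        params = gossip_step(params, ring_next)
    return params

if __name__ == "__main__":
    final = train()
    spread = max(
        max(a[k] for a in final) - min(a[k] for a in final)
        for k in range(len(final[0]))
    )
    print(f"max per-coordinate spread across agents: {spread:.4f}")
```

In this toy setting the per-coordinate spread across agents stays small after training, mirroring the paper's epsilon-ball claim that loosely coupled agents remain close to one another even though they never fully synchronize.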
Reviews: Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning
The introduction of gossip algorithms to Deep RL is original. The work is generally clearly presented, but some of the reported baseline results do not match previously published work. Figure 1: The IMPALA results look completely off, as do the A3C results on Pong, and the A3C results in the appendix. There shouldn't be such a discrepancy between A3C and IMPALA when running with the same hyperparameters (there is a larger discrepancy on some games, but not these ones). I suspect a bug in the IMPALA implementation, or at least an unfair comparison due to all other results using a more recent (and hence more tuned) set of hyperparameters from [Stooke & Abbeel 2018].
The paper introduces gossip-based algorithms for reaching consensus and managing asynchronous updates in distributed Deep RL. The reviewers had some concerns regarding the comparison of the proposed approach to baselines (and the choice thereof), but were overall impressed by the empirical results, which show a more computationally efficient algorithm (compared to A2C/A3C) with no significant loss in learning performance. Please take into account the detailed comments of the reviewers (especially R2 and R3) when preparing the final version.
Assran, Mahmoud (Mido), Romoff, Joshua, Ballas, Nicolas, Pineau, Joelle, Rabbat, Michael